Abstract - A full custom, 25 MHz, 1.6 µm CMOS Correlator chip is presented.
The 5.15 mm by 4.23 mm chip performs either autocorrelation or crosscorrelation,
consuming less than 10 mW per channel. The correlator, designed for a space
borne spectrometer, contains 32 channels. The 24 bit accumulator registers
can be read independently of the input data path, in either 8 bit bytes or 16 bit
words. The device is cascadable and allows integration periods of up to 1.78 
seconds, at 25 Megasamples/second. The controllers, for the input data path 
and the data output section, are implemented with Sequence Invariant State 
Machines. 

1 Introduction 

A high speed, low power CMOS correlator chip is presented in this paper. The correlator, 
designed for a space borne spectrometer, contains 32 time-lag channels, each of which
contains a biasing multiplier, a 4 bit accumulator and a 24 bit counter. The sensing 
instruments provide the chip with two 2 bit input words, which can be either the same 
signal, for autocorrelation, or different signals, for crosscorrelation. The biasing multiplier 
does not perform binary multiplication, but implements a special function described in 
Section 4. External control of the correlator is quite simple, requiring only a reset pin and 
a pin to signal the end of an integration period. Simple handshaking is provided through 
a single output pin, which signals when data is available to be read from the output port. 
Output data can be read while integration is in progress, in either 8 bit bytes or 16 bit 
words, under the control of a user provided strobe. The correlator is capable of maintaining 
a 25 Megasample/second input data rate, with integration periods of up to 1.78 seconds. 
Data can be output from the chip at 10 MHz. The chip consumes less than 10 mW/channel
of average power. Auxiliary ports are provided for both of the data inputs. 

The data path of this chip is extremely regular; the initial layout of the core required
only 30 hours to complete. The controllers for the input data path and the data output
section are implemented with Sequence Invariant State Machines [1], and were initially
laid out with a compiler described in [2]. This chip is amongst the first VLSI designs to
utilize Sequence Invariant State Machines. 

Some of the main features of the correlator chip are listed below. 
• Autocorrelation or Crosscorrelation

• 25 Megasamples/second 

• 32 channels 

• Up to 1.78 second integration time at 25 MHz 

• Low Power (< 10 mW per channel)

• Cascadable 

• Selectable auxiliary ports on the data inputs 

• Integration can continue while data is output 

• CMOS and TTL compatible inputs 

2 General Description 

The correlator chip accepts two 2 bit data streams clocked at a maximum rate of 25 MHz. 
Delayed versions of one stream are multiplied with the current data of the other stream. 
Products for each delay (channel) are accumulated and the accumulator overflows are 
counted. This procedure continues for one integration period, as defined by a control line 
(INT) held low. Integration is performed continuously until INT is strobed high. At this 
time the overflow counters from each channel are isolated from their respective
accumulators. After the counters have settled, DATARDY goes high to signal that data is
available for output. The overflow counters are then cleared and reconnected to the
accumulators, and a new integration period begins. The contents of the overflow counters
are output, under user control, through a 24 bit wide shift register after DATARDY goes 
high. Data output is either word serial or byte serial, under user control. When word serial 
mode is selected only the 16 most significant bits of each channel are output. DATARDY 
will remain high until all of the 32 output registers have been read, regardless of which 
output mode is selected. A test mode is provided to decrease the time required to test the 
onboard overflow counters. 


3 Functional Description 

3.1 Initialization 

The chip must be powered up with RN held low for at least 10 clock cycles, while OUTCK 
and INT are held low. This will bring the chip into a sanity state while guaranteeing that 
the output pads will be tristated. During this time the overflow counters will be cleared 
and the control state machine will be prepared for normal operation. Integration will begin 
on the clock following RN being brought and held high. The delay path shift register and
the channel accumulators can never be cleared. Figure 1 shows a block diagram of the
correlator chip.

Figure 1: Functional Block Diagram

3.2 Data Input 

Data will be input to the chip on the A and B data buses (A0, A1 and B0, B1) every clock cycle.
Although data will still be clocked into the chip, processing will not occur between INT
being strobed high and DATARDY going high. Pins A0 and A1 are the least significant
and most significant bits, respectively, of the delay line. B0 and B1 are the least significant
and most significant bits, respectively, of the undelayed signal.

Additionally, two auxiliary input ports (AUXA0, AUXA1 and AUXB0, AUXB1) are
provided. These ports are multiplexed with the A and B buses, respectively, under the
control of the AUXINA and AUXINB pins. When AUXINA is held high, the A bus is
the input port to the chip; when AUXINA is held low, the AUXA bus is the input
port. Likewise, when AUXINB is held high, the B bus is the input port to the
chip; when AUXINB is held low, the AUXB bus is the input port. The
auxiliary input ports behave identically to the primary ports.

3.3 Correlation 

Correlation begins on the clock following RN being brought and held high or when INT 
is held low and DATARDY goes high (signaling that data is ready from the previous 
integration period). At that time each channel will multiply the data on the B bus with 
the output from its respective delay element. The product will be accumulated with the
previous sum for that channel. Any overflow from the accumulator will be counted in the 
overflow counter of that channel. This process will continue until the INT signal is strobed 
high for at least 1 clock cycle. 
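
The per-channel behaviour described above can be sketched in software. The following Python model is a minimal illustration under stated assumptions, not the chip's logic: it uses one delay element per lag (Section 5 states that the chip uses two per channel), it ignores the pipeline registers, and it takes the multiplier as a caller-supplied function because the Figure 2 truth table is not reproduced in this text. The names `CorrelatorModel`, `clock` and `product` are hypothetical.

```python
class CorrelatorModel:
    """Behavioural sketch of the correlator core (illustrative only)."""

    def __init__(self, product, channels=32, acc_bits=4, counter_bits=24):
        self.product = product              # stand-in for the biasing multiplier
        self.acc_bits = acc_bits
        self.acc_mask = (1 << acc_bits) - 1
        self.cnt_mask = (1 << counter_bits) - 1
        self.delay = [0] * channels         # delayed A samples, one per channel
        self.acc = [0] * channels           # 4 bit accumulator registers
        self.count = [0] * channels         # 24 bit overflow counters

    def clock(self, a, b):
        """Shift one A sample into the delay line and accumulate every lag."""
        self.delay = [a] + self.delay[:-1]
        for k, delayed_a in enumerate(self.delay):
            total = self.acc[k] + self.product(delayed_a, b)
            self.acc[k] = total & self.acc_mask
            # Adder carry out; at most 1 for a 3 bit product and 4 bit sum.
            carry = total >> self.acc_bits
            self.count[k] = (self.count[k] + carry) & self.cnt_mask


# Usage with a constant product of 1 (an artificial stand-in multiplier):
model = CorrelatorModel(lambda a, b: 1, channels=4)
for _ in range(32):
    model.clock(0, 0)
print(model.count)  # each accumulator has overflowed twice: [2, 2, 2, 2]
```

With a constant product of 1, each 4 bit accumulator carries out once every 16 clocks, which is why 32 clocks leave a count of 2 in every channel.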

The multiplications are not purely binary in nature. The output of the multiplier is 
biased in a manner described in Section 4. The accumulator contains a four bit adder and 
four bit register. The carry out of the adder is the toggle signal into the overflow ripple
counter. The overflow counters are 24 bits wide, allowing for the count of up to 2^24 - 1
overflows. The frequency of overflow is a function of sample frequency and the length of 
the integration period. 
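
The relationship between counter width, sample rate and integration time can be made concrete with a rough bound. The sketch below assumes that every sample contributes the largest possible product, so that the 4 bit accumulator carries out as often as it can; the function name and the `max_product` parameter are hypothetical, and the worst-case product value is an assumption because it depends on the Figure 2 truth table, which is not reproduced in this text.

```python
def max_integration_seconds(max_product, sample_rate_hz=25e6,
                            acc_bits=4, counter_bits=24):
    """Approximate longest integration before a 24 bit counter can overflow."""
    max_carries = 2 ** counter_bits - 1      # largest count the counter holds
    units_per_carry = 2 ** acc_bits          # accumulated units per carry out
    worst_case_samples = max_carries * units_per_carry / max_product
    return worst_case_samples / sample_rate_hz


# Assumed worst-case product of 6 (illustrative only):
print(round(max_integration_seconds(6), 2))  # 1.79
```

Under that assumed worst-case product of 6 the bound comes out near the 1.78 second figure quoted for 25 Megasamples/second; a larger worst-case product would shorten it.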

3.4 Data Output 

At the end of the integration period, signaled by INT being strobed high for at least 1 
clock cycle, the overflow counters will be isolated from the processing elements. After the 
overflow counters have settled (10 clock periods, maximum) the contents of the counters 
will be dumped into an output shift register and the chip will signal that data is ready 
(DATARDY) on the output bus. When DATARDY goes high the first data can be read 
from the output bus on the next rising edge of OUTCK. When DATARDY goes high, a
new integration period begins. DATARDY will remain high until all 32 output registers
have been read, regardless of which output mode is selected. 

At all times, after the clock starts, data will be output from the end of the delay line
at pins AOUT0 and AOUT1. AOUT0 is the delayed version of A0 and AOUT1 is the
delayed version of A1. These pins may be used in a cascaded system by connecting them
to the A0 and A1 pins of the next chip downstream.

Data output is, optionally, word serial or byte serial. Holding the output control signal 
(BYTE) high during the output phase will output the 24 bits of the counters in 8 bit bytes, 
most significant byte first. Pins DO0-DO7 will be used (DO0 being the least significant bit).
Pins DO8-DO15 will be tristated. Holding BYTE low will cause the 16 most significant
bits of the counter to appear on pins DO0-DO15. 
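
The two output formats can be illustrated with a short sketch. The helper names `format_counter` and `strobes_to_unload` are hypothetical; the code only restates the bit selection described above (three bytes, most significant first, in byte mode; the upper 16 bits in word mode) and is not a model of the output circuitry.

```python
def format_counter(count, byte_mode):
    """Return the values presented on the output port for one 24 bit counter."""
    count &= 0xFFFFFF
    if byte_mode:
        # BYTE high: three 8 bit transfers on DO0-DO7, most significant first.
        return [(count >> 16) & 0xFF, (count >> 8) & 0xFF, count & 0xFF]
    # BYTE low: one transfer of the 16 most significant bits on DO0-DO15.
    return [count >> 8]


def strobes_to_unload(byte_mode, channels=32):
    """Number of OUTCK rising edges needed to read every channel."""
    return channels * (3 if byte_mode else 1)


print([hex(v) for v in format_counter(0xABCDEF, byte_mode=True)])
# ['0xab', '0xcd', '0xef']
print(hex(format_counter(0xABCDEF, byte_mode=False)[0]))  # 0xabcd
print(strobes_to_unload(True), strobes_to_unload(False))  # 96 32
```

Word mode discards the 8 least significant bits of each counter, which is the trade made for a threefold reduction in the number of strobes.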

The output pins DO0-DO15 will be tristated whenever DATARDY is low or when 
DATARDY is high and OUTCK is low. 

3.4.1 Word Serial Mode 

When BYTE is held low (word serial mode), successive output words will be clocked out 
by the rising edge of OUTCK. During the low half of the OUTCK cycle the output bus 
will be tristated. OUTCK has a minimum frequency of 0 Hz and a maximum frequency of 
10 MHz. OUTCK periods do not have to be of equal length and the duty cycle need not 
be 50%, but a minimum pulse width of 44ns is required. 

During the output phase the output data shift register will be cleared. The output 
clock must be strobed 32 times to unload the output shift registers. Data on the A and B 
buses continues to be input to the chip during the output phase. 

Data output is terminated by bringing and holding OUTCK low after reading out all 
32 channels. 

3.4.2 Byte Serial Mode 

When BYTE is held high (byte serial mode), successive output bytes will be clocked out 
by the rising edge of OUTCK. During the low half of the OUTCK cycle the output bus 
will be tristated. OUTCK has a minimum frequency of 0Hz and a maximum frequency of 
10 MHz. OUTCK periods do not have to be of equal length and the duty cycle need not 
be 50%, but a minimum pulse width of 44ns is required. 

During the output phase the output data shift register will be cleared. The output 
clock must be strobed 96 times to unload all of the output registers. This option allows 
access to the 8 least significant bits of the overflow counters, as well as providing an 8 bit 
data bus. 

At the end of the output phase the chip will be ready to begin a new correlation. Data 
on the A and B buses continues to be input to the chip during the output phase. 

Data output is terminated by bringing and holding OUTCK low after reading out all 
32 channels. 

3.5 Test Mode

Test mode provides a method for checking the operation of the 24 bit overflow counters. 
Test mode is entered by bringing and holding TEST high, while in an integration period. 
Test mode breaks the overflow counters into three 8 bit counters, the inputs to which 
are the overflow bit of the adder in each channel. An appropriate input pattern must be 
applied to the A and B buses during test mode operation. Access to the counters is the 
same as during normal operation. 
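
The benefit of splitting the counter can be seen by counting adder carries. The sketch below is a minimal illustration, assuming that in test mode each 8 bit segment is incremented by every carry and that the three segments are read back in their normal byte positions; the function names are hypothetical.

```python
def carries_to_roll_over(test_mode, counter_bits=24, segments=3):
    """Adder carries needed to step a counter through all of its states."""
    bits = counter_bits // segments if test_mode else counter_bits
    return 2 ** bits


def count_after(carries, test_mode):
    """Counter contents after a number of carries, starting from zero."""
    if not test_mode:
        return carries & 0xFFFFFF
    segment = carries & 0xFF                 # all three segments count together
    return (segment << 16) | (segment << 8) | segment


print(carries_to_roll_over(False))       # 16777216
print(carries_to_roll_over(True))        # 256
print(hex(count_after(255, True)))       # 0xffffff
```

Every bit of the counter is exercised after 256 carries in test mode, instead of 2^24 carries in normal operation.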

4 Biasing Multiplication 

The multiplication to be performed takes two 2 bit input words and forms a 3 bit product. 
The mapping of data is described in Figure 2. B0 and B1 are the real-time inputs. A0
and A1 are the delay line inputs. P0, P1 and P2 are the product outputs.

Figure 2: Biasing Multiplication Truth Table


5 Data Path Design 

Each of the 32 channels integrated on the correlator chip is identical. A single channel
consists of 2 delay elements for the time-lag input signal, one biasing multiplier, one 4 bit 
adder with a 4 bit accumulator register and a 24 bit counter. 

Figure 3: General block diagram.


The overflow counter, which stores correlator integration values, is an asynchronous
ripple counter. The asynchronous design reduces the power requirements of the chip. The
biasing multiplier is implemented as an n-transistor pass network [3, 4, 5, 6], which yields a very
dense, regular, combinational logic network, while providing the operational speed required 
by the 25 MHz clock frequency. The four bit adder is implemented as a Transmission 
Gate Conditional Sum Adder [7,8]. This adder provides the performance needed while 
minimizing both capacitive loads and silicon area required to implement this function. 
In general, the use of pass transistor networks reduces nodal capacitance, which is an 
important consideration in low power applications. Each channel also contains two pipe 
registers, between the carry out of the adder and the input to the overflow counter. These 
registers serve two purposes. First, the propagation path would be too long for a 40ns clock 
period and second the registers provide a means for isolating the ripple counters from the 
input data path. At the end of an integration period, the counters must be allowed to 
settle, before output data is ready for reading. All registers are static. 

6 Controller Design 

The correlator chip requires two controllers. The input data path controller maintains the 
state information required by the 32 channels. The input data path has two main states. 
The first state is normal integration and the second is the preparation of integrated data 
for output. The output controller provides the control of the output shift register and the 
formatting of the data sent to the output port. 

Both controllers were designed using Sequence Invariant State Machines [1]. The logic 
of such state machines is invariant with respect to the actual sequence required. Only the 
number of states and inputs need to be known to specify the logic. 

A general block diagram of a Sequence Invariant State Machine is shown in Figure 3. 
The logic that forms each next state equation, Yi, consists of a storage device (a D flip-flop), 
next state excitation circuitry, a Binary Tree Structured (BTS) network, which generates 
the next state values to the flip-flop, and input logic consisting of a pass transistor matrix. 
Present state information is fed back to the next state logic and input information drives 
the input switch matrix. 

A general BTS network is employed to formulate the next state equations for sequential
circuits. This general BTS network represents a complete decoding of an input space and
hence only constants are input to the network. Any specific function can be realized by
simply changing the pass variable constants, 1(0), at the input to the appropriate branch.
The input matrix is programmed with appropriate connections to 1(0) to produce the
desired state transitions. Changing the sequence of operation merely requires a
reprogramming of these connections. For the correlator state machines, the input switch matrix
was eliminated by applying the inputs, J, as pass variables to the BTS network. Work to
produce a general theory for this logic reduction is currently under way at the UI NASA
SERC.
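
The idea that the network is fixed and only its leaf constants change can be sketched in software. The following Python model is an illustration under simplifying assumptions, not the circuit of [1]: the tree is modelled as levels of 2:1 selectors controlled by the present state bits, external inputs are omitted, and the function names are hypothetical.

```python
def bts_select(leaves, controls):
    """Evaluate a binary tree of 2:1 selectors; controls are given LSB first."""
    level = list(leaves)
    for c in controls:
        # Each control variable halves the number of live branches.
        level = [level[2 * i + c] for i in range(len(level) // 2)]
    return level[0]


def program_leaves(next_state, n_bits):
    """Leaf constants (0 or 1) for each next state bit Y_i."""
    return [[(next_state[s] >> i) & 1 for s in range(2 ** n_bits)]
            for i in range(n_bits)]


def run(next_state, n_bits, start, steps):
    """Step the machine; the tree is fixed, only the leaf constants differ."""
    leaves = program_leaves(next_state, n_bits)
    trace = [start]
    for _ in range(steps):
        state = trace[-1]
        controls = [(state >> i) & 1 for i in range(n_bits)]
        trace.append(sum(bts_select(leaves[i], controls) << i
                         for i in range(n_bits)))
    return trace


print(run([1, 2, 3, 0], 2, 0, 4))  # [0, 1, 2, 3, 0]
print(run([2, 3, 1, 0], 2, 0, 4))  # [0, 2, 1, 3, 0]
```

Both sequences use the same `bts_select` structure; only the tables of constants passed to the leaves differ, which mirrors the reprogramming described above.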


6.1 Input Controller 

During integration, the input state machine connects the 24 bit overflow counters to the 
calculation section, while disconnecting the output shift register. At the end of an inte- 
gration period, signaled by INT going active, the controller moves through a fixed set of 
states. The state machine first isolates the overflow counters from the calculator. The 
machine then steps through a number of states while the ripple counters settle. At that 
time the counters are first parallel loaded into the output shift register and then reset. The 
state machine then reconnects the counters to the calculator, beginning a new integration 
period. This controller also sets a latch which signals that data is ready for reading. 
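
The fixed end-of-integration sequence can be written out as a list of states. This is a minimal sketch: the state names are hypothetical labels for the steps described above, the point at which the data ready latch is set is an assumption, and the number of wait states is left as a parameter.

```python
def end_of_integration_states(wait_states):
    """States visited after INT goes active, in order (labels are illustrative)."""
    return (["ISOLATE_COUNTERS"]
            + ["WAIT_FOR_RIPPLE_SETTLE"] * wait_states
            + ["LOAD_OUTPUT_SHIFT_REGISTER",
               "RESET_COUNTERS",
               "RECONNECT_AND_SET_DATA_READY"])


# The dead time, in clocks, grows by one for every wait state kept.
for n in (8, 5):
    print(n, len(end_of_integration_states(n)))
```

Reducing the number of wait states shortens the sequence without changing its order.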

The use of Sequence Invariant State Machines proved invaluable in this application. 
At the time of initial logic design, it was unknown how long it would take for the ripple 
counters to settle. It was desirable to minimize the length of time that the counters were
disconnected, as this time shortens the integration period. The state machine was initially 
designed with a number of wait states which was deemed sufficient. After circuit design 
was finished on the counters, several states could be removed. The redesign of the state 
machine required about 10 minutes of engineering time and about 10 minutes of layout 
time. This is a significant improvement over traditional state machine designs. The output 
equations of the signals required to control the data path are formed by logical blocks which 
are identical to those in the state machine itself, as described in [9]. Again as the number 
of states changed, the output equations were also easily modified. 

The input state machine requires only two external signals, INT and RESET, for proper 
operation. 

6.2 Output Controller 

The output state machine provides signals which control the output and formatting of 
correlated data. This chip has two modes for data output, byte serial mode and word 
serial mode, which are selected with the BYTE pin. When the circuit is in byte mode the 
24 bit counters are read out in 8 bit bytes. Word mode sends only the upper 16 bits of
each register. Output is under user control. New data appears on the output pins on the 
rising edge of OUTCK. When OUTCK is low the output pins are tristated. Additionally, 
when in byte mode, the upper 8 bits of the output port are tristated. When all data has 
been read the DATARDY flag is reset. 

The output controller, therefore, must control several operations. Byte mode requires
that the three 8 bit portions of the overflow counters be multiplexed onto the lower 8 
bits of the output port. The mux control signals are formed by the output controller. 
Additionally, the tristate signal, for the upper 8 bits of the output port, is provided by 
this machine. A seven bit counter is decoded for either byte or word mode. This counter 
maintains a count of the amount of data which has been read out. When the output shift 
register is empty, the DATARDY flag is reset. This controller also provides the clock for 
the output shift register itself. 

Again, as in the input controller, Sequence Invariant State Machines were utilized in 
this controller. As logic design progressed, inevitable changes were easy to implement with 
these functional blocks. 

This controller requires three external signals: BYTE, to signal which output mode is
active; DRI, which is the data ready signal from the input controller; and RESET.

7 Layout 

The mask design of the correlator chip was straightforward, as the structure is extremely
regular. The base cells and the layout of the correlator core required only 30 hours of layout
time. The core of the chip, the 32 correlator channels, contains 31,948 transistors. The
n-transistor to p-transistor ratio is 1.77, which reflects the extensive use of n-transistor
pass networks. The silicon area consumed by the core is 3.49 mm by 2.52 mm, which yields
a density of 275.3 µm²/device.
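
The quoted density follows directly from the core dimensions and the transistor count. A short check, with a hypothetical helper name, is:

```python
def core_density_um2_per_device(width_mm, height_mm, devices):
    """Silicon area per device, in square micrometres."""
    area_um2 = width_mm * height_mm * 1e6    # 1 mm^2 = 1e6 um^2
    return area_um2 / devices


print(round(core_density_um2_per_device(3.49, 2.52, 31_948), 1))  # 275.3
```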

The layout of the Sequence Invariant State Machines was done with a pre-release
version of the silicon compiler described in [2]. The correlator chip, as a whole, contains no
more than 120 random devices. Figure 4 is a plot of the correlator chip.


8 Summary 

A 25 MHz CMOS correlator chip has been described. The chip provides either crosscorrelation
or autocorrelation of 2 bit input signals at a data rate of 25 Megasamples/second. The
32 channel chip, designed for space applications, consumes no more than 10 mW/channel.
The VLSI circuit has two options for data output and provides a simple handshaking
scheme. The layout of the correlator is highly regular and has taken advantage of
Sequence Invariant State Machines in the controller design.

Figure 4: Correlator Chip Plot